Disclosure Risk Measurement with Entropy in Two-Dimensional Sample Based Frequency Tables
Authors
Abstract
We extend a disclosure risk measure defined for population based frequency tables to sample based frequency tables. The disclosure risk measure is based on information theoretical expressions, such as entropy and conditional entropy, that reflect the properties of attribute disclosure. To estimate the disclosure risk of a sample based frequency table we need to take into account the underlying population and therefore need both the population and sample frequencies. However, population frequencies might not be known and therefore they must be estimated from the sample. We consider two probabilistic models, a log-linear model and a so-called Pólya urn model, to estimate the population frequencies. Numerical results suggest that the Pólya urn model may be a feasible alternative to the log-linear model for estimating population frequencies and the disclosure risk measure.
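As a rough illustration of why entropy and conditional entropy relate to attribute disclosure, the sketch below computes both for a small two-dimensional frequency table. It is not the risk measure defined in the paper: the table values are invented, the function names `entropy` and `conditional_entropy` are hypothetical, and no log-linear or Pólya urn estimation of population frequencies is attempted.

```python
import math

def entropy(freqs):
    """Shannon entropy (in nats) of a sequence of non-negative counts."""
    total = sum(freqs)
    return -sum((f / total) * math.log(f / total) for f in freqs if f > 0)

def conditional_entropy(table):
    """H(row variable | column variable) for a table given as a list of rows."""
    total = sum(sum(row) for row in table)
    h = 0.0
    for col in zip(*table):  # iterate over the columns of the table
        col_total = sum(col)
        if col_total > 0:
            # weight each column's entropy by the share of units in that column
            h += (col_total / total) * entropy(col)
    return h

# Invented example table: rows are categories of a sensitive attribute,
# columns are categories of a known (identifying) attribute.
table = [
    [5, 0, 3],
    [2, 0, 4],
    [1, 6, 0],
]

for j, col in enumerate(zip(*table)):
    print(f"column {j}: entropy = {entropy(col):.3f}")
print(f"conditional entropy H(row | column) = {conditional_entropy(table):.3f}")
```

In this invented table, every unit in the second column falls into the same row, so that column has zero entropy: knowing that someone belongs to that column reveals their row category. Low conditional entropy signals the same effect averaged over all columns.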
Similar resources
Measuring Disclosure Risk and Information Loss in Population Based Frequency Tables
Frequency tables disseminated by statistical agencies have always been of high interest. However, the agencies have to ensure that the risk of identifying individuals and disclosing individuals’ attributes from the released data is low. Therefore they assess the risk of disclosure and apply statistical disclosure control (SDC) methods if necessary. The main objective of this work is to measure ...
Risk measurement and Implied volatility under Minimal Entropy Martingale Measure for Levy process
This paper focuses on two main issues that are based on two important concepts: the exponential Lévy process and the minimal entropy martingale measure. First, we intend to obtain risk measures such as value-at-risk (VaR) and conditional value-at-risk (CVaR) using the Monte Carlo method under the minimal entropy martingale measure (MEMM) for the exponential Lévy process. This martingale measure is used for the...
A posteriori Disclosure Risk Measure for Tabular Data Based on Conditional Entropy
Statistical database protection, also known as Statistical Disclosure Control (SDC), is a part of information security which tries to prevent published statistical information (tables, individual records) from disclosing the contribution of specific respondents. This paper deals with the assessment of the disclosure risk associated with the release of tabular data. So-called sensitivity rules are...
Statistical Disclosure Control Methods for Census Frequency Tables
This paper provides a review of common statistical disclosure control (SDC) methods implemented at Statistical Agencies for standard tabular outputs containing whole population counts from a Census (either enumerated or based on a register). These methods include record swapping on the microdata prior to its tabulation and rounding of entries in the tables after they are produced. The approach ...
Cell Suppression to Limit Content-Based Disclosure
The increasing demand for information, coupled with the increasing capability of computer systems, has compelled information providers to reassess their procedures for preventing disclosure of confidential information. General logical and numerical methods exist to determine, prior to release, if disclosure can occur, either directly or through inference. One method uses linear programming techni...